Introduction to useful software and workflows

Git: Idea and concept


Git: How to use it

| Interface | Ease of use | Efficiency | Nerdiness |
|---|---|---|---|
| CLI | ●○○○○ | ●●○○○ | ●●●●● |
| GUI | ●●●○○ | ●●●○○ | ●●○○○ |
| RStudio | ●●●○○ | ●●●●○ | ●●●○○ |
| VSCode | ●●●●● | ●●●●○ | ●●●●○ |


Git: First and basic steps

$ git config --global user.name "<your name>"
$ git config --global user.email "<your email>"
$ git init <your-repository-name>
$ git status

$ git add <file-name-1> <file-name-2>   OR   $ git add --all
$ git commit -m "<commit-message>"
OR BOTH IN ONE (stages only files Git already tracks; new files still need git add)
$ git commit -am "<commit-message>"
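A minimal sketch of these first steps, run in a throwaway temporary directory so it does not touch your real projects. The repository name `demo-repo`, the file names, and the demo identity are hypothetical; the identity is set per repository (without `--global`) only to keep the sketch self-contained. The last commit shows that `git commit -am` picks up changes to files Git already tracks but leaves new files untracked.

```shell
#!/usr/bin/env bash
# First Git steps in a throwaway directory (all names are hypothetical).
set -euo pipefail

cd "$(mktemp -d)"
git init demo-repo          # creates demo-repo/ containing the hidden .git/ folder
cd demo-repo

# Identity for this demo repository only; on your own machine use --global once
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'x <- 1' > analysis.R
git status --short          # "??" marks untracked files

git add analysis.R
git commit -m "Add analysis script"

# Change the tracked file and create a new one
echo 'y <- 2' >> analysis.R
echo 'some notes' > notes.txt

# -a stages modified tracked files only, so notes.txt is not committed
git commit -am "Update analysis script"
git status --short          # notes.txt is still listed as untracked
```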

GitHub: Remote and cooperative workflow

$ git clone <git-repo-url>

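$ git branch <branch-name>
$ git checkout <name-of-your-branch>
OR BOTH IN ONE
$ git checkout -b <name-of-your-branch>

$ git push
(first push of a new branch: git push -u origin <name-of-your-branch>)

$ git fetch
$ git merge <remote>/<branch-name>
OR BOTH IN ONE
$ git pull <remote> (<branch-name>)

Fork (a GitHub feature, not a git command): use the "Fork" button on GitHub or the GitHub CLI
$ gh repo fork <owner>/<repo>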
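The branch, push, fetch and merge commands can be tried without GitHub. This sketch assumes a local bare repository (`remote.git`) as a stand-in for the GitHub remote; the names `my-clone` and `feature-readme` are hypothetical. It uses `git branch --show-current` (Git 2.22 or later) because the default branch may be called `main` or `master` depending on your configuration.

```shell
#!/usr/bin/env bash
# Branch workflow against a local bare repository that plays the role of GitHub.
# remote.git, my-clone and feature-readme are hypothetical names.
set -euo pipefail

cd "$(mktemp -d)"
git init -q --bare remote.git     # stand-in for the GitHub repository
git clone -q remote.git my-clone  # warns that the repository is empty; fine here
cd my-clone
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit on the default branch, then publish it
echo "# Course material" > README.md
git add README.md
git commit -qm "Initial commit"
MAIN="$(git branch --show-current)"   # main or master, depending on your Git config
git push -q -u origin "$MAIN"

# Create and switch to a feature branch in one step, commit, and publish it
git checkout -q -b feature-readme
echo "Some notes" >> README.md
git commit -qam "Extend README"
git push -q -u origin feature-readme  # -u links it to origin/feature-readme

# Back on the main branch: fetch, then merge the remote feature branch
# (together this is what "git pull origin feature-readme" does)
git checkout -q "$MAIN"
git fetch -q origin
git merge -q origin/feature-readme
```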

GitHub: Credentials

GitHub: .gitignore

How it works: a plain text file listing the files and folders (patterns) that Git should not track

Use case: sensitive data; temporary and old files; large data files; outputs → usually track only plain text files (e.g. R scripts, TeX source)

Get started: 2 approaches, either list what to ignore, or ignore everything (/*) and re-include exceptions with ! (an online tool can also generate .gitignore content for you)

/data/old
passwords.txt
*.doc
------OR------
/*
!.gitignore
!/scripts
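To see what each approach actually excludes, this sketch creates a few hypothetical files in a temporary repository, writes each pattern list to `.gitignore` in turn, and asks Git which files it still reports. `git check-ignore -v` prints the rule responsible for each ignored path, which helps when a pattern does not behave as expected.

```shell
#!/usr/bin/env bash
# Compare the two .gitignore approaches (all file names are hypothetical examples).
set -euo pipefail

cd "$(mktemp -d)"
git init -q ignore-demo
cd ignore-demo

mkdir -p data/old scripts
touch data/old/raw_2019.csv data/new.csv passwords.txt report.doc scripts/clean.R

# Approach 1: list what NOT to track
printf '%s\n' '/data/old' 'passwords.txt' '*.doc' > .gitignore
echo "Approach 1 still tracks:"
git status --porcelain --untracked-files=all

# Approach 2: ignore everything at the top level, then re-include exceptions with !
printf '%s\n' '/*' '!.gitignore' '!/scripts' > .gitignore
echo "Approach 2 still tracks:"
git status --porcelain --untracked-files=all

# Show which rule ignores each path (exit code 1 if none is ignored, hence || true)
git check-ignore -v data/new.csv passwords.txt || true
```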
!/scripts

Git: In RStudio

  1. start a new project in RStudio by cloning the repository: File → New Project → Version Control → Git → enter the repository URL and a local folder
  2. a new "Git" tab appears in the upper-right pane
  3. add, commit, and push your changes to GitHub directly from RStudio

Git: Resources

Exercise: Course material

  1. form groups
  2. install Git and create a GitHub account
  3. become a collaborator (tell me your GitHub username)
  4. clone our course material repository
  5. add a personal folder with a test file in the exercises folder
  6. push these changes to the remote repository
  7. pull the changes of the other participants
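Steps 4 to 7 can be rehearsed offline. This sketch assumes a local bare repository (`course.git`) in place of the GitHub course repository and two hypothetical participants, `alice` and `bob`, who clone it at the same time. Because Alice pushes first, Bob's push would be rejected until he pulls her commit, which is why pulling regularly matters.

```shell
#!/usr/bin/env bash
# Rehearse exercise steps 4-7 offline; a local bare repository replaces GitHub.
# course.git, alice, bob and exercises/<name>/test.md are hypothetical names.
set -euo pipefail

cd "$(mktemp -d)"

# Stand-in for the course repository, with an exercises folder
git init -q --bare course.git
git clone -q course.git setup
git -C setup config user.name "Teacher"
git -C setup config user.email "teacher@example.com"
mkdir setup/exercises
echo "Exercises" > setup/exercises/README.md
git -C setup add --all
git -C setup commit -qm "Add exercises folder"
git -C setup push -q origin HEAD

# Steps 4 and 5: both participants clone, then add a personal folder with a test file
for name in alice bob; do
  git clone -q course.git "$name"
  git -C "$name" config user.name "$name"
  git -C "$name" config user.email "$name@example.com"
  mkdir -p "$name/exercises/$name"
  echo "Hello from $name" > "$name/exercises/$name/test.md"
  git -C "$name" add "exercises/$name"
  git -C "$name" commit -qm "Add test file for $name"
done

# Step 6: Alice pushes first; a push from Bob would now be rejected,
# so he pulls (fetch + merge) Alice's commit before pushing
git -C alice push -q
git -C bob pull -q --no-rebase --no-edit
git -C bob push -q

# Step 7: Alice pulls Bob's changes
git -C alice pull -q --no-rebase
ls alice/exercises
```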

Installation: Quarto

On Tuesday and Wednesday, we are going to use Quarto documents (.qmd) instead of R scripts. Quarto should come pre-installed with RStudio. To check, open the file “day1_r_git/quarto_testfile.qmd” in RStudio.

Also make sure that you see the “Source” and “Visual” buttons in the top left (see image).

If it is not installed, please update your RStudio version!

Installation: RSelenium

You will need to follow the steps described in this video.

For Everyone:

For Apple users:


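install.packages(c("RSelenium", "wdman", "netstat", "binman"))

library(RSelenium)
library(wdman)
library(netstat)   # provides free_port()

# start the Selenium server (checks for and downloads the drivers)
selenium()

# return the start command instead of running it, without re-checking driver versions
selenium_object <- selenium(retcommand = TRUE,
                            check = FALSE)

# list the installed chromedriver versions; pick the one matching your Chrome version
binman::list_versions("chromedriver")

# The following command should open a browser window (you might need to adjust the version!)
remote_driver <- rsDriver(browser = "chrome",
                          chromever = "126.0.6478.127",
                          verbose = FALSE,
                          port = free_port())

# close the browser window and the server
remote_driver$client$close()
remote_driver$server$stop()

# If you start it a few times but never close the server, there might be no free port left.
# Kill all Java processes (this also stops any other Java programs you are running):
if (.Platform$OS.type == "windows") {
  system("taskkill /im java.exe /f", intern = FALSE, ignore.stdout = FALSE)
} else {
  system("pkill java")   # macOS / Linux
}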

If you manage to start your Chrome browser with the above script, RSelenium is installed properly.

Installation: Python Anaconda

Decide which one to download:
  • Anaconda (extensive and effortless)
  • Miniconda (slim and customizable)

Installation: Transformers

Install the Python package transformers with the conda package manager:

  1. open the Anaconda/Miniconda prompt (on macOS/Linux: a terminal)
  2. conda install -c conda-forge transformers
  3. check the installed packages: conda list

If you use Anaconda, you can also try the GUI “Anaconda Navigator”.

Code Editor: VSCode


Code Editor: Benefits

Code Editor: Resources

Best practice: Folder and file structure

  1. use separate folders for scripts, data, output, and reports
  2. if there are too many files (more than ~10), use subfolders
  3. separate raw data files from processed data
  4. use clear and consistent names for script, data, and output files:
    • numbering, lowercase, words joined with underscores or hyphens
    • if a date is necessary, put it at the end as YYYYMMDD so files sort chronologically
  5. split the work into multiple script files for different (sub)tasks (max. ~100 lines each)
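A minimal sketch that creates the suggested layout for a hypothetical project `my_project`, with numbered script names and dated data files (all names are illustrative assumptions). Because the date sits at the end in YYYYMMDD format, an alphabetical listing of `data/raw` is also chronological.

```shell
#!/usr/bin/env bash
# Create the suggested layout for a hypothetical project "my_project".
set -euo pipefail

cd "$(mktemp -d)"
mkdir -p my_project/{scripts,data/raw,data/processed,output,reports}
cd my_project

# Numbered, lowercase names joined by underscores; the numbers give the run order
touch scripts/01_import_data.R scripts/02_clean_data.R scripts/03_fit_models.R

# Raw data stays untouched; processed data goes to its own folder.
# Dates go at the end as YYYYMMDD, so alphabetical order is chronological order.
touch data/raw/survey_20240115.csv data/raw/survey_20231201.csv
touch data/processed/survey_clean_20240120.csv

ls scripts data/raw
```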

Best practice: Efficient R scripts

  1. load libraries, define default variables, and source() helper code at the top of the script
  2. comment and structure sections (# ---- headline ----)
  3. use the native pipe |> (or magrittr's %>%) to chain functions
  4. use indentation and spaces for readability
  5. keep lines to a maximum of 80 characters
  6. DRY (don't repeat yourself): use lists, lapply, vectorization, and functions
  7. use relative paths for data and output
  8. avoid hard-coded subsetting and indexing

Best practice: Resources